[WIP][SPARK-54571][CORE] Use LZ4 safeDecompressor #53290
dbtsai wants to merge 1 commit into apache:master
|
It's more involved than I thought, as LZ4BlockInputStream doesn't take a safeDecompressor. I will take a deeper look tomorrow. |
|
It seems that the fix for CVE-2025-12183 wasn't implemented until version 1.8.1, but Spark is still using version 1.8.0. |
|
Note that LZ4BlockInputStream does not support safeDecompressor in lz4-java 1.8.1. If you upgrade to that version, it will still work and be secure, but performance will be much worse than in 1.8.0. lz4-java 1.10.0 introduces a new builder for LZ4BlockInputStream that accepts a safeDecompressor. |
|
It is published, but only under the new group id |
|
@yawkat, which group id is 1.10.0 published under? |
|
we need change to use |
|
Thank you for the updated info, @yawkat, @LuciferYang. Could you update this PR, @dbtsai? |
|
To all: in order to help this PR, I made an independent PR for the dependency upgrade. |
|
We upgraded to |
|
I recommend waiting a few hours before releasing this. Another (smaller, unrelated) CVE has been found in lz4-java. |
|
CVE-2025-66566 has been published and fixed in 1.10.1. I suggest you move to that version, though Cloudflare seems to be having some trouble that is breaking Maven Central at the moment. |
Just FYI, we have no intention to hurry this, @yawkat . To be safe, this will be tested in That's the main reason why LZ4 1.10.0 PR is only in |
|
Gentle ping once more, @dbtsai. |
|
Gentle ping, @dbtsai. |
|
The PR description mentions that |
|
I am not aware of any Spark benchmarks, but as of 1.8.1, safeDecompressor is substantially faster than fastDecompressor. In earlier versions, the difference was minor. |
|
@yawkat, how about the performance for 1.10.0? @dongjoon-hyun has already bumped to 1.10.0. |
|
Performance between 1.8.1 and 1.10.1 has not changed substantially. |
|
@mridulm I found a perf report at yawkat/lz4-java#3 (comment), but it does not provide the underlying data. |
|
@mridulm @yawkat @dbtsai @dongjoon-hyun @SteNicholas I created #53453 to add an lz4 benchmark based on TPC-DS. |
|
The underlying lz4 library was updated in 1.9.0, so a performance difference is possible. |
|
While this is being merged, would you consider "setting As far as I saw, only this config key has LZ4 as default. Also, it seems like |
|
@Dzeri96, setting |


What changes were proposed in this pull request?
In recent LZ4 versions, safeDecompressor has become highly optimized and can be as fast as, or sometimes even faster than, fastDecompressor. So it makes sense to switch to safeDecompressor.
Why are the changes needed?
It is recommended to switch to .safeDecompressor(), which is not vulnerable and provides better performance per https://sites.google.com/sonatype.com/vulnerabilities/cve-2025-12183
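For readers unfamiliar with what "safe" means here, the following is a minimal sketch of a bounds-checked decoder for a single LZ4 block. It is a toy written for illustration only: it is not lz4-java's code, the class name `SafeLz4Sketch` and its `decompress` method are hypothetical, and it ignores the frame/stream layer that LZ4BlockInputStream handles. The point is only that a safe-style decoder validates every length and offset it reads from the input before copying any bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy bounds-checked decoder for one LZ4 block (illustration only, not lz4-java code). */
final class SafeLz4Sketch {

    /** Decodes one LZ4 block into at most maxOutput bytes, rejecting malformed input. */
    static byte[] decompress(byte[] src, int maxOutput) {
        byte[] dst = new byte[maxOutput];
        int in = 0;
        int out = 0;
        while (in < src.length) {
            int token = src[in++] & 0xFF;

            // Literal run: length is the high nibble, extended by extra bytes while they are 255.
            int literalLen = token >>> 4;
            if (literalLen == 15) {
                int b;
                do {
                    if (in >= src.length) throw new IllegalArgumentException("truncated literal length");
                    b = src[in++] & 0xFF;
                    literalLen += b;
                    if (literalLen > src.length) throw new IllegalArgumentException("literal length too large");
                } while (b == 255);
            }
            // Check both the input and the output bounds before copying.
            if (literalLen > src.length - in || literalLen > maxOutput - out) {
                throw new IllegalArgumentException("literal run out of bounds");
            }
            System.arraycopy(src, in, dst, out, literalLen);
            in += literalLen;
            out += literalLen;

            if (in == src.length) break; // the last sequence carries literals only

            // Match: 2-byte little-endian offset pointing back into already decoded output.
            if (src.length - in < 2) throw new IllegalArgumentException("truncated offset");
            int offset = (src[in] & 0xFF) | ((src[in + 1] & 0xFF) << 8);
            in += 2;
            if (offset == 0 || offset > out) {
                throw new IllegalArgumentException("offset outside decoded data");
            }

            // Match length: low nibble plus the minimum match length of 4.
            int matchLen = token & 0x0F;
            if (matchLen == 15) {
                int b;
                do {
                    if (in >= src.length) throw new IllegalArgumentException("truncated match length");
                    b = src[in++] & 0xFF;
                    matchLen += b;
                    if (matchLen > maxOutput) throw new IllegalArgumentException("match length too large");
                } while (b == 255);
            }
            matchLen += 4;
            if (matchLen > maxOutput - out) throw new IllegalArgumentException("match run out of bounds");

            // Byte-by-byte copy so that overlapping matches (offset < matchLen) repeat correctly.
            for (int i = 0; i < matchLen; i++) {
                dst[out] = dst[out - offset];
                out++;
            }
        }
        return Arrays.copyOf(dst, out);
    }

    public static void main(String[] args) {
        // Hand-built block: literals "abcd", a match (offset 4, length 8), then literals "efghi".
        byte[] block = {0x44, 'a', 'b', 'c', 'd', 0x04, 0x00, 0x50, 'e', 'f', 'g', 'h', 'i'};
        System.out.println(new String(decompress(block, 17), StandardCharsets.US_ASCII));

        // A block whose match offset points before the start of the output is rejected.
        try {
            decompress(new byte[] {0x04, 0x10, 0x00}, 64);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch a corrupt or hostile block fails with an exception instead of copying from or to a position outside the buffers. How lz4-java implements its own checks, and what they cost, is not covered by this PR description.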
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No